NSF PAR Search | NSF Public Access Repository

Selection on X1 + X2 + ⋯ + Xm via Cartesian product trees

https://doi.org/10.7717/peerj-cs.483

Kreitzberg, Patrick; Lucke, Kyle; Pennington, Jake; Serang, Oliver (April 2021, PeerJ)
Selection on the Cartesian product is a classic problem in computer science. Recently, an optimal algorithm for selection on A + B, based on soft heaps, was introduced. By combining this approach with layer-ordered heaps (LOHs), an algorithm using a balanced binary tree of A + B selections was proposed to perform selection on X1 + X2 + ⋯ + Xm in o(n⋅m + k⋅m) time, where each Xi has length n. Here, that o(n⋅m + k⋅m) algorithm is combined with a novel, optimal LOH-based algorithm for selection on A + B (without a soft heap). The performance of algorithms for selection on X1 + X2 + ⋯ + Xm is compared empirically, demonstrating the benefit of the algorithm proposed here.
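To make the problem statement concrete, the following is a minimal sketch of selection on A + B: returning the k smallest values of a[i] + b[j]. It is a simple sort-and-heap baseline, not the soft-heap or LOH-based algorithm described in the abstract, and it runs in roughly O(n log n + k log k) rather than the optimal bound. The function name k_smallest_pairwise_sums is a hypothetical name chosen for illustration.

```python
import heapq


def k_smallest_pairwise_sums(a, b, k):
    """Return the k smallest values of a[i] + b[j], in ascending order.

    Baseline only: sorts both inputs, then explores the implicit sum
    matrix with a binary heap. Not the optimal algorithm from the paper.
    """
    a, b = sorted(a), sorted(b)
    if not a or not b or k <= 0:
        return []
    k = min(k, len(a) * len(b))

    # Heap entries are (sum, i, j); (0, 0) holds the smallest possible sum.
    heap = [(a[0] + b[0], 0, 0)]
    seen = {(0, 0)}
    out = []
    while len(out) < k:
        s, i, j = heapq.heappop(heap)
        out.append(s)
        # Only the right and down neighbours can be the next smallest sums.
        for ni, nj in ((i + 1, j), (i, j + 1)):
            if ni < len(a) and nj < len(b) and (ni, nj) not in seen:
                seen.add((ni, nj))
                heapq.heappush(heap, (a[ni] + b[nj], ni, nj))
    return out


print(k_smallest_pairwise_sums([1, 4, 7], [2, 3, 10], 4))  # -> [3, 4, 6, 7]
```

Selection on X1 + X2 + ⋯ + Xm could be built from such pairwise selections, which is where the balanced binary tree mentioned in the abstract comes in; this sketch covers only the two-array case.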
Full Text Available
Performing Selection on a Monotonic Function in Lieu of Sorting Using Layer-Ordered Heaps

https://doi.org/10.1021/acs.jproteome.0c00711

Lucke, Kyle; Pennington, Jake; Kreitzberg, Patrick; Serang, Oliver (February 2021, Journal of proteome research)
Nonparametric statistical tests are an integral part of scientific experiments in a diverse range of fields. When performing such tests, it is standard to sort values; however, this requires Ω(n log(n)) time to sort n values. Thus, given enough data, sorting becomes the computational bottleneck, even with highly optimized implementations such as the C++ standard library routine, std::sort. Frequently, a nonparametric statistical test is only used to partition values above and below a threshold in the sorted ordering, where the threshold corresponds to a significant statistical result. Linear-time selection and partitioning algorithms cannot be directly used because the selection and partitioning are performed on the transformed statistical significance values rather than on the sorted statistics. Usually, those transformed statistical significance values (e.g., p values when investigating the family-wise error rate and q values when investigating the false discovery rate (FDR)) can only be computed at a threshold. Because this threshold is unknown, this leads to sorting the data. Layer-ordered heaps, which can be constructed in O(n) time, only partially sort values and thus can be used to get around the slow runtime required to fully sort. Here we introduce a layer-ordering-based method for selection and partitioning on the transformed values (e.g., p values or q values). We demonstrate the use of this method to partition peptides using an FDR threshold. This approach is applied to speed up Percolator, a postprocessing algorithm used in mass-spectrometry-based proteomics to evaluate the quality of peptide-spectrum matches (PSMs), by >70% on data sets with 100 million PSMs.
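As an illustration of the partial ordering a layer-ordered heap provides, here is a minimal sketch under stated assumptions. It assumes an LOH splits the values into layers whose sizes grow roughly geometrically by a factor alpha, with every value in one layer no larger than every value in the next. It uses a randomized quickselect-style split, so the running time is linear in expectation rather than worst case. The names split_smallest, layer_sizes, build_loh, and the parameter alpha are hypothetical and not taken from the paper.

```python
import math
import random


def split_smallest(values, m):
    """Return (m smallest values, remaining values); expected linear time.

    Assumes 0 <= m <= len(values). Neither output list is sorted.
    """
    small, large, pool, need = [], [], list(values), m
    while True:
        if need == 0:
            return small, large + pool
        if need == len(pool):
            return small + pool, large
        pivot = random.choice(pool)
        lo = [v for v in pool if v < pivot]
        eq = [v for v in pool if v == pivot]
        hi = [v for v in pool if v > pivot]
        if need <= len(lo):
            large += eq + hi
            pool = lo
        elif need <= len(lo) + len(eq):
            cut = need - len(lo)
            return small + lo + eq[:cut], large + eq[cut:] + hi
        else:
            small += lo + eq
            need -= len(lo) + len(eq)
            pool = hi


def layer_sizes(n, alpha):
    """Geometrically growing layer sizes (1, alpha, alpha^2, ...) summing to n."""
    sizes, size, remaining = [], 1.0, n
    while remaining > 0:
        s = min(remaining, math.ceil(size))
        sizes.append(s)
        remaining -= s
        size *= alpha
    return sizes


def build_loh(values, alpha=2.0):
    """Partition values into layers with max(layer i) <= min(layer i + 1)."""
    pool, layers = list(values), []
    # Peel off the largest layer first; the remaining work shrinks geometrically.
    for s in reversed(layer_sizes(len(pool), alpha)):
        pool, top = split_smallest(pool, len(pool) - s)
        layers.append(top)
    layers.reverse()
    return layers


layers = build_loh([random.random() for _ in range(20)])
print([len(layer) for layer in layers])  # layer sizes, smallest layer first
```

Values within a layer stay unordered, which is what lets construction avoid the cost of a full sort; a threshold search can then examine whole layers before refining inside a single layer.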
Full Text Available
Fast Exact Computation of the k Most Abundant Isotope Peaks with Layer-Ordered Heaps

https://doi.org/10.1021/acs.analchem.0c01670

Kreitzberg, Patrick; Pennington, Jake; Lucke, Kyle; Serang, Oliver (July 2020, Analytical chemistry)
Computation of the isotopic distribution of compounds is crucial to applications of mass spectrometry, particularly as machine precision continues to improve. In the past decade, several tools have been created for doing so. In this paper we present a novel algorithm for calculating either the most abundant k isotopologue peaks of a compound or the minimal set of isotopologue peaks that have a combined total abundance of at least p. The algorithm uses Serang’s optimal method of selection on Cartesian products. The method is significantly faster than the state of the art on large compounds (e.g., Titin protein) and on compounds whose elements have many isotopes (e.g., palladium alloys).
Full Text Available